
The bag of communities: Identifying abusive behavior online with preexisting internet data / Chandrasekharan, E.; Samory, M.; Srinivasan, A.; Gilbert, E. (2017), pp. 3175-3187. (Paper presented at the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017, held in Denver, CO, USA) [10.1145/3025453.3026018].

The bag of communities: Identifying abusive behavior online with preexisting internet data

Chandrasekharan E.; Samory M.; Srinivasan A.; Gilbert E.
2017

Abstract

Harassment and abuse have plagued the Internet since its earliest days. Recent research has focused on in-domain methods for detecting abusive content, an approach that faces several challenges, most notably the need to obtain large training corpora. In this paper, we introduce a novel computational approach to this problem called Bag of Communities (BoC), a technique that leverages large-scale, preexisting data from other Internet communities. We then apply BoC to identifying abusive behavior within a major Internet community. Specifically, we compute a post's similarity to 9 other communities from 4chan, Reddit, Voat and MetaFilter. We show that a BoC model can be used on communities "off the shelf" with roughly 75% accuracy; no training examples are needed from the target community. A dynamic BoC model achieves 91.18% accuracy after seeing 100,000 human-moderated posts and uniformly outperforms in-domain methods. Based on this conceptual and empirical work, we argue that the BoC approach may allow communities to deal with a range of common problems, like abusive behavior, faster and with fewer engineering resources.
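To make the core idea concrete, the following is a minimal sketch of a BoC-style classifier that represents a post by its similarity to each of several source communities and feeds those similarities to a classifier. It is an illustration under stated assumptions, not the authors' implementation: the add-one smoothed unigram models, the mean-centered average log-likelihood used as the "similarity" score, the online logistic-regression combiner, and every name in the code (`BagOfCommunities`, `fit_static`, `update`) are hypothetical choices made here. `fit_static` imitates the "off the shelf" setting by labeling each source post only by whether its community is treated as abusive; calling `update` on human-moderated posts from the target community imitates the dynamic setting.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class CommunityModel:
    """Hypothetical add-one smoothed unigram model of one source community."""

    def __init__(self, posts):
        self.counts = Counter(tok for post in posts for tok in tokenize(post))
        self.total = sum(self.counts.values())

    def avg_log_prob(self, text, vocab_size):
        """Average per-token log-likelihood of `text` under this community."""
        toks = tokenize(text)
        if not toks:
            return 0.0
        denom = self.total + vocab_size
        return sum(math.log((self.counts[t] + 1) / denom) for t in toks) / len(toks)


class BagOfCommunities:
    """Hypothetical BoC-style classifier: per-community similarity features
    combined by an online (SGD-trained) logistic regression."""

    def __init__(self, communities, lr=0.5):
        # communities: {name: (list_of_posts, is_abusive)}; names are illustrative.
        self.communities = communities
        self.names = sorted(communities)
        self.models = {n: CommunityModel(communities[n][0]) for n in self.names}
        vocab = set()
        for model in self.models.values():
            vocab |= set(model.counts)
        self.vocab_size = len(vocab) + 1  # +1 reserves mass for unseen tokens
        self.w = [0.0] * len(self.names)
        self.b = 0.0
        self.lr = lr

    def features(self, text):
        """Similarity of `text` to each community, centered across communities."""
        raw = [self.models[n].avg_log_prob(text, self.vocab_size) for n in self.names]
        mean = sum(raw) / len(raw)
        return [r - mean for r in raw]

    def predict_proba(self, text):
        """Estimated probability that `text` is abusive."""
        x = self.features(text)
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def update(self, text, label):
        """One SGD step on a labeled post (label 1 = abusive, 0 = not)."""
        x = self.features(text)
        err = self.predict_proba(text) - label
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def fit_static(self, epochs=200):
        """'Off the shelf' training: each source post inherits its community's label."""
        for _ in range(epochs):
            for name in self.names:
                posts, is_abusive = self.communities[name]
                for post in posts:
                    self.update(post, 1 if is_abusive else 0)
```

For instance, with a hypothetical dictionary such as `{"abusive_a": (["you are an idiot"], True), "civil_a": (["thanks for the helpful answer"], False)}`, one would construct `BagOfCommunities(communities)`, call `fit_static()`, score new posts with `predict_proba`, and then call `update(post, label)` whenever a moderator labels a post from the target community. Centering the per-community scores makes each feature express relative similarity, so a post counts as close to a community only in comparison with the other communities in the bag.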
Conference: 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017
Keywords: Abusive behavior; Machine learning; Moderation; Online communities; Social computing
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1655754
Warning: the data displayed have not been validated by the university.

Citations
  • PubMed Central: N/A
  • Scopus: 92
  • Web of Science (ISI): 52